Dataset Description
A total of 26,890 individuals from 8,698 families were genotyped on the GSA-24v1-0_A1.
- 15,138 males, 11,752 females.
- Family size ranged from 2 to 10 individuals.
- 634,709 SNPs were included in the genotype files.
- Note that coordinates are based on build 38 (GRCh38).
Raw Genotype QC
Sex Check
- Based on 6,995 chrX SNPs that passed QC (--geno 0.05 --maf 0.01 --hwe 1e-6 --mind 0.1).
- 147 individuals flagged as PROBLEM:
- 130 with ambiguous SNPSEX (chrX F close to 0.2/0.8)
- 17 with SNPSEX different from PEDSEX (needs further investigation)
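The split between ambiguous and mismatched samples can be illustrated with a minimal Python sketch. It assumes the PLINK default thresholds (F below 0.2 is called female, F above 0.8 is called male, anything in between gets no call). The function names and sample values are hypothetical and are not taken from this dataset. Unlike PLINK, the sketch does not handle samples with a missing PEDSEX.

```python
def call_snpsex(f_stat, female_max=0.2, male_min=0.8):
    """Return 1 (male), 2 (female), or 0 (no call) from a chrX F estimate."""
    if f_stat > male_min:
        return 1
    if f_stat < female_max:
        return 2
    return 0


def sex_check_status(pedsex, f_stat):
    """Classify one sample as OK, AMBIGUOUS, or MISMATCH."""
    snpsex = call_snpsex(f_stat)
    if snpsex == 0:
        return "AMBIGUOUS"  # F falls between the two thresholds
    return "OK" if snpsex == pedsex else "MISMATCH"


# Hypothetical samples as (PEDSEX, chrX F); 1 = male, 2 = female.
samples = [(1, 0.98), (2, 0.03), (2, 0.25), (1, 0.10)]
statuses = [sex_check_status(pedsex, f) for pedsex, f in samples]
print(statuses)  # ['OK', 'OK', 'AMBIGUOUS', 'MISMATCH']
```

In this sketch both AMBIGUOUS and MISMATCH correspond to the PROBLEM status counted above.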
ChrX F distributions

Pairwise IBD estimation
- Relationships (RT): OT (Others), FS (Full Siblings), HS (Half Siblings), PO (Parent Offspring)
- IBS sharing for other pairs ranged
- from 0.20 to 1.00 in FS,
- from 0.45 to 0.64 in PO,
- from 0.00 to 0.55 in OT,
- indicating inbreeding between some parents and possible relatedness between families, as multiplex families were included.
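Per-relationship ranges like those above can be summarized with a short Python sketch. It assumes the pairwise estimates are available as (RT, sharing) tuples, for example taken from the RT and PI_HAT columns of a PLINK --genome report. The function names, the example values, and the 0.125 cutoff for flagging OT pairs are all hypothetical.

```python
from collections import defaultdict


def range_by_relationship(pairs):
    """Return {RT: (min, max)} for an iterable of (RT, sharing) tuples."""
    grouped = defaultdict(list)
    for rt, value in pairs:
        grouped[rt].append(value)
    return {rt: (min(values), max(values)) for rt, values in grouped.items()}


def flag_related_others(pairs, threshold=0.125):
    """Return OT pairs whose sharing exceeds a hypothetical cutoff."""
    return [(rt, value) for rt, value in pairs if rt == "OT" and value > threshold]


# Hypothetical example values, not taken from this dataset.
pairs = [
    ("FS", 0.48), ("FS", 0.55),
    ("PO", 0.50), ("PO", 0.61),
    ("OT", 0.00), ("OT", 0.30),
]
ranges = range_by_relationship(pairs)
flagged = flag_related_others(pairs)
print(ranges)
print(flagged)
```

OT pairs returned by `flag_related_others` would be candidates for cryptic relatedness between families and worth inspecting individually.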
Estimated pairwise IBD distributions

Individual genome-wide heterozygosity
Genome-wide heterozygosity VS missing rates

Genome-wide F VS missing rates

Imputation
Pre-imputation
The imputation pipeline follows the one used for the SSC dataset. A total of 26,867 individuals, ~400K autosomal SNPs, and ~7K chrX SNPs were retained for imputation.
- filters: --geno 0.05 --mind 0.1 --maf 0.01 --hwe 1e-6
- 23 people removed due to missing genotype data (--mind).
- Total genotyping rate in remaining samples is 0.981795.
- 41,406 variants removed due to missing genotype data (--geno).
- 62,974 variants removed due to Hardy-Weinberg exact test (--hwe).
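The filters listed above could be assembled into a PLINK 1.9 call along the following lines. This is a minimal sketch: the input and output prefixes (`raw_genotypes`, `pre_imputation_qc`) and the helper name are hypothetical, and whether the original pipeline applied all filters in a single run is an assumption. The sketch only builds and prints the command, it does not execute it.

```python
import shlex


def build_plink_qc_command(bfile, out, geno=0.05, mind=0.1, maf=0.01, hwe=1e-6):
    """Build a PLINK 1.9 argument list applying the pre-imputation filters."""
    return [
        "plink",
        "--bfile", bfile,    # hypothetical input prefix (.bed/.bim/.fam)
        "--geno", str(geno), # per-variant missingness threshold
        "--mind", str(mind), # per-sample missingness threshold
        "--maf", str(maf),   # minor allele frequency threshold
        "--hwe", str(hwe),   # Hardy-Weinberg exact test p-value threshold
        "--make-bed",
        "--out", out,        # hypothetical output prefix
    ]


cmd = build_plink_qc_command("raw_genotypes", "pre_imputation_qc")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where PLINK is installed.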